BIT and MSRA at TREC KBA CCR Track 2013

نویسندگان

  • Jingang Wang
  • Dandan Song
  • Lejian Liao
  • Chin-Yew Lin
چکیده

Our strategy for TREC KBA CCR track is to first retrieve as many vital or documents as possible and then apply more sophisticated classification and ranking methods to differentiate vital from useful documents. We submitted 10 runs generated by 3 approaches: question expansion, classification and learning to rank. Query expansion is an unsupervised baseline, in which we combine entities’ names and their related entities’ names as phrase queries to retrieve relevant documents. This baseline outperforms the overall median and mean submissions. The system performance is further improved by supervised classification and learning to rank methods. We mainly exploit three kinds of external resources to construct the features in supervised learning: (i) entry pages of Wikipedia entities or profile pages of Twitter entities, (ii) existing citations in the Wikipedia page of an entity, and (iii) burst of Wikipedia page views of an entity. In vital + useful task, one of our ranking-based methods achieves the best result among all participants. In vital only task, one of our classification-based methods achieve the overall best result.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Pattern Matching Approach to Streaming Slot Filling

In this paper, we described our system for Knowledge Base Acceleration (KBA) Track at TREC 2013. The KBA Track has two tasks, CCR and SSF. Our approach consists of two major steps: selecting documents and extracting slot values. Selecting documents is to look for and save the documents that mention the entities of interest. The second step involves with generating seed patterns to extract the s...

متن کامل

Filtering Documents over Time on Evolving Topics - The University of Amsterdam at TREC 2013 KBA CCR

In this paper we describe the University of Amsterdam’s approach to the TREC 2013 Knowledge Base Acceleration (KBA) Cumulative Citation Recommendation (CCR) track. The task is to filter a stream of documents for documents relevant to a given set of entities. We model the task as a multi-class classification task. Entities may evolve over time and the classifier should be able to adapt to these ...

متن کامل

BIT and Purdue at TREC-KBA-CCR Track 2014

This report summarizes our participation at KBA-CCR track in TREC 2014. Our submissions are generated in two steps: (1) Filtering a candidate documents collection from the stream corpus for a set of target entities; and (2) Estimating the relevance levels between candidate documents and target entities. Three kinds of approaches are employed in the second step, including query expansion, classi...

متن کامل

A Related Entity based Approach for Knowledge Base Acceleration

In this paper we present the overview of our work in the TREC 2013 KBA Track. The goal is to find documents which may contribute to the update of knowledge base entries (e.g., Wikipedia or Freebase articles). Two tasks are introduced in this year’s track: (1) Cumulative Citation Recommendation (CCR), (2) Streaming Slot Filling (SSF). Particularly, we focus on the CCR task, follow our previous w...

متن کامل

Filtering Entity Centric Documents using Numerics and Temporals features within RF Classifier

This article introduces the system designed to work for the Knowledge Base Acceleration (KBA) track at Text REtrieval Conference (TREC). This is the second time this track is run with yet this year there is even more challenge. KBA is focused on keeping up to date knowledge bases (KB) such as Wikipedia1 (WP) where each KB node is considered as an entity (WP page for wikipedia example). It has b...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013